When using LiDAR semantic segmentation models in safety-critical applications such as autonomous driving, it is essential to understand and improve their robustness to a wide range of LiDAR corruptions. In this paper, we aim to comprehensively analyze the robustness of LiDAR semantic segmentation models under various corruptions. To rigorously evaluate the robustness and generalizability of current approaches, we propose a new benchmark called SemanticKITTI-C, which features 16 out-of-domain LiDAR corruptions in three groups: adverse weather, measurement noise, and cross-device discrepancy. We then systematically investigate 11 LiDAR semantic segmentation models spanning different input representations (e.g., point clouds, voxels, projected images), network architectures, and training schemes. Through this study, we obtain two insights: 1) the input representation plays a crucial role in robustness; different representations degrade differently under specific corruptions. 2) Although state-of-the-art LiDAR semantic segmentation methods achieve promising results on clean data, they are far less robust when dealing with noisy data. Finally, based on these observations, we design a robust LiDAR segmentation model (RLSeg) that greatly boosts robustness with simple but effective modifications. We hope that our benchmark, comprehensive analysis, and observations can foster future research in robust LiDAR semantic segmentation for safety-critical applications.
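The measurement-noise group of corruptions can be illustrated with a minimal sketch. This is not the benchmark's actual corruption pipeline; the function name, parameters, and the two corruptions shown (Gaussian jitter and random point dropout) are illustrative assumptions:

```python
import numpy as np

def corrupt_point_cloud(points, jitter_std=0.02, drop_ratio=0.1, seed=0):
    """Apply two simple corruptions to an (N, 3) LiDAR point cloud:
    Gaussian jitter on the coordinates and random point dropout."""
    rng = np.random.default_rng(seed)
    # Gaussian measurement noise on x, y, z
    noisy = points + rng.normal(0.0, jitter_std, size=points.shape)
    # Randomly drop a fraction of the points (e.g., to mimic beam loss)
    keep = rng.random(len(noisy)) >= drop_ratio
    return noisy[keep]

cloud = np.zeros((1000, 3))
corrupted = corrupt_point_cloud(cloud)
print(corrupted.shape[1])  # 3 — coordinates preserved, some points dropped
```

A robustness benchmark applies such transforms only at evaluation time, so models trained on clean data are tested strictly out of domain.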
Modeling perception sensors is key for simulation-based testing of automated driving functions. Beyond weather conditions themselves, sensors are also subject to object-dependent environmental influences such as tire spray caused by vehicles moving on wet pavement. In this work, a novel modeling approach for spray in lidar data is introduced. The model conforms to the Open Simulation Interface (OSI) standard and is based on the formation of detection clusters within a spray plume. The detections are rendered with a simple custom ray casting algorithm, without the need for a fluid dynamics simulation or physics engine. The model is subsequently used to generate training data for object detection algorithms. It is shown that the model significantly improves detection in real-world spray scenarios. Furthermore, a systematic real-world data set is recorded and published for analysis, model calibration, and validation of spray effects in active perception sensors. Experiments are conducted on a test track by driving over artificially watered pavement with varying vehicle speeds, vehicle types, and levels of pavement wetness. All models and data of this work are available open source.
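The core rendering step (casting lidar rays against detection clusters instead of simulating fluid dynamics) can be sketched as a ray-sphere intersection test. This assumes spherical clusters for simplicity; the model's actual plume geometry, cluster formation, and OSI interfaces are more involved:

```python
import numpy as np

def ray_hits_cluster(origin, direction, center, radius):
    """Return the distance along a unit-length ray at which it first hits a
    spherical detection cluster, or None if it misses."""
    oc = origin - center
    b = np.dot(oc, direction)
    c = np.dot(oc, oc) - radius ** 2
    disc = b * b - c  # discriminant of the ray-sphere quadratic
    if disc < 0:
        return None  # ray misses the cluster
    t = -b - np.sqrt(disc)  # nearest intersection
    return t if t > 0 else None

# Cast one lidar ray through a spray cluster 10 m ahead of the sensor
t = ray_hits_cluster(np.zeros(3), np.array([1.0, 0.0, 0.0]),
                     np.array([10.0, 0.0, 0.0]), 0.5)
print(t)  # 9.5
```

Each hit would then be emitted as a low-intensity spray detection in the synthetic scan.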
Unsupervised sim-to-real domain adaptation (UDA) for semantic segmentation aims to improve the real-world test performance of a model trained on simulated data. It can save the cost of manually labeling data in real-world applications such as robot vision and autonomous driving. Traditional UDA often assumes that abundant unlabeled real-world samples are available during training for the adaptation. However, this assumption does not always hold in practice owing to collection difficulty and data scarcity. We therefore aim to relax this reliance on large amounts of real data and explore the one-shot unsupervised sim-to-real domain adaptation (OSUDA) and generalization (OSDG) problem, where only one real-world data sample is available. To remedy the limited real-data knowledge, we first construct a pseudo-target domain by stylizing the simulated data with the one-shot real sample. To mitigate the sim-to-real domain gap at both the style and spatial-structure levels and facilitate adaptation, we further propose class-aware cross-domain transformers with an intermediate domain randomization strategy to extract domain-invariant knowledge from both the simulated and pseudo-target data. We demonstrate the effectiveness of our approach for OSUDA and OSDG on different benchmarks, outperforming the state-of-the-art methods by large margins of 10.87, 9.59, 13.05, and 15.91 mIoU on GTA, SYNTHIA$\rightarrow$Cityscapes, and Foggy Cityscapes, respectively.
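The pseudo-target construction can be sketched as channel-statistics stylization in the spirit of AdaIN: simulated features are re-normalized to the per-channel mean and standard deviation of the single real sample. This is a minimal illustration under that assumption, not the paper's actual stylization module:

```python
import numpy as np

def stylize(sim_feat, real_feat, eps=1e-5):
    """Re-normalize simulated features (C, H, W) to the per-channel
    mean/std of the one-shot real sample's features."""
    axes = (1, 2)
    mu_s = sim_feat.mean(axis=axes, keepdims=True)
    std_s = sim_feat.std(axis=axes, keepdims=True) + eps
    mu_r = real_feat.mean(axis=axes, keepdims=True)
    std_r = real_feat.std(axis=axes, keepdims=True) + eps
    # Whiten simulated content, then re-color with the real style
    return (sim_feat - mu_s) / std_s * std_r + mu_r

rng = np.random.default_rng(0)
sim = rng.normal(0, 1, (3, 8, 8))
real = rng.normal(5, 2, (3, 8, 8))
out = stylize(sim, real)
# Output channel means now match the real sample's
print(np.allclose(out.mean(axis=(1, 2)), real.mean(axis=(1, 2)), atol=1e-6))
```

The spatial content of the simulated sample is preserved while its low-level style moves toward the real domain, which is what makes the result usable as a pseudo-target domain.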
In unsupervised domain adaptation (UDA), a model trained on source data (e.g., synthetic) is adapted to target data (e.g., real-world) without access to target annotations. Most previous UDA methods struggle with classes that have a similar visual appearance on the target domain, as no ground truth is available to learn the slight appearance differences. To address this problem, we propose a Masked Image Consistency (MIC) module to enhance UDA by learning spatial context relations of the target domain as additional clues for robust visual recognition. MIC enforces consistency between predictions on masked target images, where random patches are withheld, and pseudo-labels that are generated from the complete image by an exponential moving average teacher. To minimize the consistency loss, the network has to learn to infer the predictions of the masked regions from their context. Due to its simple and universal concept, MIC can be integrated into various UDA methods across different visual recognition tasks such as image classification, semantic segmentation, and object detection. MIC significantly improves the state-of-the-art performance across the different recognition tasks for synthetic-to-real, day-to-nighttime, and clear-to-adverse-weather UDA. For instance, MIC achieves an unprecedented UDA performance of 75.9 mIoU and 92.8% on GTA-to-Cityscapes and VisDA-2017, respectively, which corresponds to improvements of +2.1 and +3.0 percentage points over the previous state of the art. The implementation is available at https://github.com/lhoyer/MIC.
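The masked-consistency idea can be sketched with a simplified NumPy illustration. Patch masking and the loss are shown on toy arrays; MIC's real implementation operates on batched tensors inside a UDA training loop, with pseudo-labels produced by an EMA teacher on the unmasked image:

```python
import numpy as np

def mask_patches(image, patch=4, mask_ratio=0.5, seed=0):
    """Zero out random square patches of an (H, W, C) image."""
    rng = np.random.default_rng(seed)
    h, w, _ = image.shape
    out = image.copy()
    for i in range(0, h, patch):
        for j in range(0, w, patch):
            if rng.random() < mask_ratio:
                out[i:i + patch, j:j + patch] = 0.0
    return out

def consistency_loss(student_probs, teacher_pseudo_labels):
    """Cross-entropy between the student's predictions on the masked image
    and hard pseudo-labels from the teacher on the full image."""
    eps = 1e-8
    picked = student_probs[np.arange(len(teacher_pseudo_labels)),
                           teacher_pseudo_labels]
    return -np.mean(np.log(picked + eps))

masked = mask_patches(np.ones((8, 8, 1)))
loss = consistency_loss(np.array([[0.9, 0.1], [0.2, 0.8]]), np.array([0, 1]))
print(round(loss, 3))  # 0.164
```

Because whole patches are withheld, minimizing this loss forces the student to reconstruct labels from surrounding context rather than local appearance.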
Improving a model's generalizability against domain shifts is crucial, especially for safety-critical applications such as autonomous driving. Real-world domain styles can vary substantially due to environment changes and sensor noise, yet deep models only see the training domain style. This domain style gap impedes model generalization on diverse real-world domains. Our proposed Normalization Perturbation (NP) effectively overcomes this domain-style overfitting problem. We observe that the problem is mainly caused by the biased distribution of low-level features learned in shallow CNN layers. Thus, we propose to perturb the channel statistics of source-domain features to synthesize various latent styles, so that the trained deep model can perceive diverse potential domains and generalize well even without observing target-domain data during training. We further explore style-sensitive channels for effective style synthesis. Normalization Perturbation relies only on a single source domain, is surprisingly effective, and is extremely easy to implement. Extensive experiments verify the effectiveness of our method for generalizing models under real-world domain shifts.
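Perturbing channel statistics to synthesize latent styles can be sketched as follows. The noise distribution, scale, and insertion point are assumptions for illustration; the paper's exact formulation may differ:

```python
import numpy as np

def normalization_perturbation(feat, alpha=0.5, seed=0):
    """Synthesize a latent style from a (C, H, W) feature map by randomly
    rescaling each channel's mean and deviation around their source values."""
    rng = np.random.default_rng(seed)
    c = feat.shape[0]
    mu = feat.mean(axis=(1, 2), keepdims=True)
    sigma = feat.std(axis=(1, 2), keepdims=True) + 1e-5
    # Independent random factors for channel deviation and channel mean
    a = 1.0 + alpha * rng.standard_normal((c, 1, 1))
    b = 1.0 + alpha * rng.standard_normal((c, 1, 1))
    return (feat - mu) / sigma * (sigma * a) + mu * b

rng = np.random.default_rng(1)
feat = rng.normal(size=(4, 8, 8))
aug = normalization_perturbation(feat)
print(aug.shape)  # (4, 8, 8)
```

Applied to shallow-layer features during training, each forward pass sees a differently "styled" version of the same content, which counteracts the low-level style overfitting described above.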
Predicting the multimodal future behavior of traffic participants is essential for robotic vehicles to make safe decisions. Existing works either predict future trajectories directly from latent features or use dense goal candidates to identify an agent's destination; the former strategy converges slowly since all motion modes are derived from the same feature, while the latter has efficiency issues since its performance depends heavily on the density of the goal candidates. In this paper, we propose the Motion TRansformer (MTR) framework, which models motion prediction as the joint optimization of global intention localization and local movement refinement. Instead of using goal candidates, MTR incorporates spatial intention priors by adopting a small set of learnable motion query pairs. Each motion query pair is responsible for trajectory prediction and refinement of a specific motion mode, which stabilizes the training process and facilitates better multimodal predictions. Experiments show that MTR achieves state-of-the-art performance on both the marginal and joint motion prediction challenges, ranking first on the Waymo Open Motion Dataset leaderboards. Code will be available at https://github.com/sshaoshuai/mtr.
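One common way to tie each learnable query pair to a distinct motion mode is to assign training targets by the nearest intention point; only the matched mode's query pair is then optimized for that agent. A minimal sketch of that assignment (the intention points here are illustrative; in practice they could come from, e.g., clustering ground-truth endpoints):

```python
import numpy as np

def assign_motion_query(gt_endpoint, intention_points):
    """Pick the motion mode whose intention point is closest to the
    ground-truth trajectory endpoint."""
    d = np.linalg.norm(intention_points - gt_endpoint, axis=1)
    return int(np.argmin(d))

# 4 illustrative intention points: ahead, left, behind, right
points = np.array([[10.0, 0.0], [0.0, 10.0], [-10.0, 0.0], [0.0, -10.0]])
print(assign_motion_query(np.array([1.0, 9.0]), points))  # 1
```

This kind of hard assignment is what keeps modes from collapsing onto the same feature, the failure case of direct latent-feature regression noted above.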
In this report, we present the first-place solution for the motion prediction track of the 2022 Waymo Open Dataset Challenges. We propose a novel Motion Transformer framework for multimodal motion prediction, which introduces a novel set of motion query pairs for generating better multimodal future trajectories by jointly performing intention localization and iterative motion refinement. A simple model ensemble strategy with non-maximum suppression is adopted to further boost the final performance. Our approach achieved first place on the motion prediction leaderboard of the 2022 Waymo Open Dataset Challenges, outperforming other methods by remarkable margins. Code will be available at https://github.com/sshaoshuai/mtr.
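Non-maximum suppression over ensembled trajectory modes can be sketched as a greedy pass keyed on endpoint distance. The distance threshold and the endpoint-based duplicate criterion are illustrative assumptions, not the solution's exact settings:

```python
import numpy as np

def trajectory_nms(endpoints, scores, dist_thresh=2.0):
    """Greedy NMS over predicted trajectory modes: keep high-score modes,
    drop any mode whose endpoint lies near an already-kept one."""
    order = np.argsort(scores)[::-1]  # highest score first
    keep = []
    for i in order:
        if all(np.linalg.norm(endpoints[i] - endpoints[j]) > dist_thresh
               for j in keep):
            keep.append(int(i))
    return keep

ends = np.array([[10.0, 0.0], [10.5, 0.2], [0.0, 10.0]])
scores = np.array([0.9, 0.8, 0.7])
print(trajectory_nms(ends, scores))  # [0, 2] — near-duplicate mode 1 removed
```

After merging predictions from multiple models, this keeps the ensemble's output set diverse rather than letting duplicated modes crowd out genuine alternatives.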
Current efficient LiDAR-based detection frameworks fall short in exploiting object relations, which naturally exist in both spatial and temporal manners. To this end, we introduce a simple, efficient, and effective two-stage detector called Ret3D. At the core of Ret3D are novel intra-frame and inter-frame relation modules that capture spatial and temporal relations accordingly. More specifically, the intra-frame relation module (IntraRM) encapsulates the objects within a frame into a sparse graph, allowing us to refine object features through efficient message passing. The inter-frame relation module (InterRM), on the other hand, dynamically and densely connects each object to its corresponding tracklet sequence and exploits such temporal information to efficiently enhance its representations through a lightweight transformer network. We instantiate the novel designs of IntraRM and InterRM with center-based or anchor-based detectors and evaluate them on the Waymo Open Dataset (WOD). With negligible extra overhead, Ret3D achieves state-of-the-art performance, outperforming recent competitors on vehicle detection by 5.5% and 3.2% in terms of the LEVEL 1 and LEVEL 2 mAPH metrics, respectively.
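The sparse-graph message passing of the intra-frame module can be sketched as follows. This is a deliberately simplified step: edges come from a plain distance threshold and the "message" is a neighbor average, whereas IntraRM itself uses learned message functions:

```python
import numpy as np

def intra_frame_message_passing(centers, feats, radius=5.0):
    """Build a sparse graph over detected objects (edges between centers
    closer than `radius`) and refine each object's feature by averaging it
    with its neighbors' features - one simplified message-passing step."""
    n = len(centers)
    refined = feats.copy()
    for i in range(n):
        d = np.linalg.norm(centers - centers[i], axis=1)
        nbrs = d < radius  # includes object i itself
        refined[i] = feats[nbrs].mean(axis=0)
    return refined

# Two nearby vehicles exchange features; the distant one is untouched
centers = np.array([[0.0, 0.0], [1.0, 0.0], [100.0, 0.0]])
feats = np.array([[1.0, 0.0], [0.0, 1.0], [5.0, 5.0]])
out = intra_frame_message_passing(centers, feats)
print(out[0])  # [0.5 0.5]
```

Because only spatially close objects are connected, the graph stays sparse and the refinement cost grows with the number of neighbors, not the number of points.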
Besides standard cameras, autonomous vehicles typically include several other sensors, such as lidars and radars, which help acquire richer information for perceiving the content of the driving scene. While several recent works focus on fusing certain pairs of sensors, such as camera and lidar or camera and radar, by using architectural components specific to the examined setting, a generic and modular sensor fusion architecture is missing from the literature. In this work, we focus on 2D object detection, a fundamental high-level task defined on the 2D image domain, and propose HRFuser, a multi-resolution sensor fusion architecture that scales straightforwardly to an arbitrary number of input modalities. The design of HRFuser is based on state-of-the-art high-resolution networks for image-only dense prediction and incorporates a novel multi-window cross-attention block as the means for performing fusion of multiple modalities at multiple resolutions. Even though cameras alone provide very informative features for 2D detection, we demonstrate through extensive experiments on the nuScenes and Seeing Through Fog datasets that our model effectively leverages complementary features from other modalities, substantially improving upon camera-only performance and consistently outperforming state-of-the-art fusion methods for 2D detection under both normal and adverse conditions. The source code will be publicly available.
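The bare mechanism behind a cross-attention fusion block can be sketched as camera tokens querying lidar tokens. Note this omits what makes HRFuser's block "multi-window" (local windows at multiple resolutions) as well as the learned projection matrices; it shows only single-head attention on raw tokens:

```python
import numpy as np

def cross_attention(camera_tokens, lidar_tokens):
    """Single-head cross-attention: camera tokens act as queries and attend
    to lidar tokens (keys and values). Shapes: (Nq, d) and (Nk, d)."""
    d = camera_tokens.shape[1]
    logits = camera_tokens @ lidar_tokens.T / np.sqrt(d)
    # Numerically stable softmax over the lidar tokens
    weights = np.exp(logits - logits.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ lidar_tokens  # fused features, shape (Nq, d)

rng = np.random.default_rng(0)
cam = rng.normal(size=(4, 8))
lid = rng.normal(size=(6, 8))
fused = cross_attention(cam, lid)
print(fused.shape)  # (4, 8)
```

Because the output keeps the query (camera) token layout, additional modalities can be fused by stacking further cross-attention blocks without changing the backbone, which is what makes the design modular.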
This paper introduces DGNet, a novel deep framework that exploits object gradient supervision for camouflaged object detection (COD). It decouples the task into two connected branches, i.e., a context encoder and a texture encoder. The essential connection is the gradient-induced transition, which represents a soft grouping between context and texture features. Benefiting from this simple but efficient framework, DGNet outperforms existing state-of-the-art COD models by a large margin. Notably, our efficient version, DGNet-S, runs in real time (80 fps) and achieves results comparable to the cutting-edge model JCSOD-CVPR$_{21}$ with only 6.82% of its parameters. Application results further show that the proposed DGNet performs well on polyp segmentation, defect detection, and transparent object segmentation tasks. Code will be available at https://github.com/gewelsji/dgnet.
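The flavor of an object-gradient supervision signal can be illustrated by differentiating a binary object mask: the response is high on object boundaries and zero elsewhere. This is only an intuition-building sketch; how DGNet actually derives its gradient maps is not specified here:

```python
import numpy as np

def object_gradient(mask):
    """Approximate the gradient of a binary object mask with central
    differences - strong response on boundaries, zero in flat regions."""
    gy, gx = np.gradient(mask.astype(float))
    return np.hypot(gx, gy)  # per-pixel gradient magnitude

mask = np.zeros((6, 6))
mask[2:4, 2:4] = 1.0  # a small square object
grad = object_gradient(mask)
print(grad[0, 0] == 0.0, grad[2, 1] > 0.0)  # flat region vs. boundary
```

For camouflaged objects, where appearance cues are weak by design, such boundary-focused supervision gives the texture branch a target that does not depend on color contrast.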